Efficient and effective KNN sequence search with approximate n-grams
Similar resources
Efficient and Effective KNN Sequence Search with Approximate n-grams
In this paper, we address the problem of finding k-nearest neighbors (KNN) in sequence databases using the edit distance. Unlike most existing works, which use short, exact n-gram matchings together with a filter-and-refine framework for KNN sequence search, our new approach allows us to use longer but approximate n-gram matchings as a basis for pruning KNN candidates. Based on this new idea, we de...
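For orientation, the baseline the abstract contrasts against (short, exact n-gram matchings inside a filter-and-refine framework) can be sketched as follows. This is not the paper's approximate n-gram method, whose details are elided above. It is a minimal sketch, assuming the classic q-gram count filter: if ed(s, t) = d, then s and t share at least max(|s|, |t|) - q + 1 - q·d q-grams (as multisets), which yields a cheap lower bound on the edit distance. All function names and the default q = 2 are hypothetical choices for illustration.

```python
from collections import Counter
import heapq


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the len(s) - q + 1 overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def lower_bound(query: str, qg_query: Counter, s: str, qg_s: Counter, q: int) -> int:
    """Hypothetical filter step: a lower bound on edit_distance(query, s).

    Combines the q-gram count filter with the length-difference bound.
    """
    common = sum((qg_query & qg_s).values())   # multiset intersection size
    deficit = max(len(query), len(s)) - q + 1 - common
    count_lb = -(-deficit // q)                # integer ceil(deficit / q)
    return max(0, count_lb, abs(len(query) - len(s)))


def knn_search(query: str, database: list[str], k: int, q: int = 2) -> list[tuple[int, int]]:
    """Filter-and-refine KNN: returns (distance, index) pairs, nearest first."""
    qg_query = qgrams(query, q)
    # Filter: compute the cheap lower bound for every sequence, visit in ascending order.
    candidates = sorted((lower_bound(query, qg_query, s, qgrams(s, q), q), idx)
                        for idx, s in enumerate(database))
    heap: list[tuple[int, int]] = []  # max-heap of the best k, stored as (-dist, -idx)
    for lb, idx in candidates:
        if len(heap) == k and lb >= -heap[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        d = edit_distance(query, database[idx])  # refine: exact distance
        if len(heap) < k:
            heapq.heappush(heap, (-d, -idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, -idx))
    return sorted((-nd, -ni) for nd, ni in heap)


if __name__ == "__main__":
    db = ["kitten", "sitting", "mitten", "bitten", "knitting", "written", "kitchen"]
    print(knn_search("kitten", db, k=3))  # [(0, 0), (1, 2), (1, 3)]
```

Because the candidates are refined in ascending lower-bound order, the loop can stop as soon as a bound reaches the current k-th distance, so only a fraction of the exact (quadratic-time) edit distances need to be computed when the bound is tight.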
Optimized Transform Coding for Approximate KNN Search
Transform coding (TC) is an efficient and effective vector quantization approach where the resulting compact representation can be the basis for a more elaborate hierarchical framework for sub-linear approximate search. However, as compared to the state-of-the-art product quantization methods, there is a significant performance gap in terms of matching accuracy. One of the main shortcomings of ...
Popularity-Based Ranking for Fast Approximate kNN Search
Similarity searching has become widely available in many on-line archives of multimedia data. Users accessing such systems look for data items similar to their specific query object and typically refine results by re-running the search with a query from the results. We study this issue and propose a mechanism of approximate kNN query evaluation that incorporates statistics of accessing index da...
Improving KNN Arabic Text Classification with N-Grams Based Document Indexing
Text classification is the task of assigning a document to one or more predefined categories based on its contents. This paper presents the results of classifying Arabic-language documents with the KNN classifier, once using N-grams (namely unigrams and bigrams) for document indexing and once using the traditional single-term indexing method (bag of words), which supposes...
Indexing Text with Approximate q-Grams
We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the inde...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2013
ISSN: 2150-8097
DOI: 10.14778/2732219.2732220